Exploiting Value Locality to Exceed the Dataflow Limit
Abstract
The serialization constraints imposed by true data dependences have always been regarded as an absolute dataflow limit on the parallel execution of serial programs. This paper describes value prediction, a new technique that allows data-dependent instructions to issue and execute in parallel without violating program semantics. This technique exploits value locality, or the likelihood of the recurrence of a previously seen value within a storage location inside a computer system. Value prediction consists of predicting entire 32- and 64-bit register values based on previously seen values. We find that values loaded from memory or generated by ALU instructions are frequently predictable. Furthermore, we show that simple microarchitectural enhancements to a modern microprocessor implementation based on the PowerPC 620 that enable value prediction can effectively exploit value locality to collapse true dependences, reduce average memory and result latencies, and provide average performance gains of 3%-23% by exceeding the dataflow limit.
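As a rough illustration of what "predicting a value based on previously seen values" can mean, the sketch below implements a plain last-value predictor and measures how often an instruction produces the same result as it did last time. It is a minimal sketch under stated assumptions, not the mechanism evaluated in the paper: the class `LastValuePredictor`, the function `value_locality`, the table size, the indexing scheme, and the sample trace are all hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


class LastValuePredictor:
    """Direct-mapped table that predicts an instruction will repeat its last result.

    Assumption: entries are indexed by instruction address (PC) with no tags,
    so two instructions that map to the same index share (alias) one entry.
    """

    def __init__(self, entries: int = 1024) -> None:
        self.entries = entries
        self.table: Dict[int, int] = {}  # table index -> last value seen

    def _index(self, pc: int) -> int:
        # Hypothetical indexing: drop the two low bits of a word-aligned PC.
        return (pc >> 2) % self.entries

    def predict(self, pc: int) -> Optional[int]:
        # Returns None when no value has been recorded for this entry yet.
        return self.table.get(self._index(pc))

    def update(self, pc: int, actual: int) -> None:
        # Record the value the instruction actually produced.
        self.table[self._index(pc)] = actual


def value_locality(trace: Iterable[Tuple[int, int]], entries: int = 1024) -> float:
    """Fraction of (pc, value) events whose value matches the last one seen."""
    predictor = LastValuePredictor(entries)
    hits = 0
    total = 0
    for pc, value in trace:
        if predictor.predict(pc) == value:
            hits += 1
        predictor.update(pc, value)
        total += 1
    return hits / total if total else 0.0


# Made-up trace: the instruction at 0x100 always produces 7,
# while the one at 0x104 produces a new value every time.
trace = [(0x100, 7), (0x104, 1), (0x100, 7), (0x104, 2), (0x100, 7), (0x104, 3)]
print(f"value locality: {value_locality(trace):.2f}")
```

In a real pipeline a correct prediction would let dependent instructions start before the producer finishes, and a wrong one would require recovery; this sketch only counts how often the prediction would have been right.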
Similar resources
Experimental Characterization of Value Locality
A number of recent publications have described various value prediction techniques for exploiting value locality in the data-flow portion of a processor to improve program performance. An examination of the value usage characteristics of programs is appropriate and relevant because it is within the value space that value prediction mechanisms operate. In this paper, we perform an extensive expe...
Exploiting Locality of Array Data with Parallel Object-Oriented Model for Multithreaded Computation
I-structure was designed to achieve efficiency and parallelism in functional programs that manipulate large data structures. Most multithreading models based on dataflow use it, and it is placed in a global heap memory that is shared by all code blocks. In this case, we cannot effectively exploit the locality of data structure in most scientific application programs in which the production and con...
High Performance Microprocessor Design Methods Exploiting Information Locality and Data Redundancy for Lower Area Cost and Power Consumption
A value predictor that predicts the result of an instruction before real execution to exceed the dataflow limit, a redundant operation table that removes redundant computation dynamically, and an asynchronous bus that avoids the clock synchronization problem have been proposed as high-performance microprocessor design methods. However, these methods increase area cost and power consumption problems because of the larger ...
Scalable Locality-Sensitive Hashing for Similarity Search in High-Dimensional, Large-Scale Multimedia Datasets
Similarity search is critical for many database applications, including the increasingly popular online services for Content-Based Multimedia Retrieval (CBMR). These services, which include image search engines, must handle an overwhelming volume of data, while keeping low response times. Thus, scalability is imperative for similarity search in Webscale applications, but most existing methods a...
Mapping Applications to a Coarse Grain Reconfigurable System
This paper introduces a method that can be used to map applications written in a high-level source language, such as C, to a coarse grain reconfigurable architecture, MONTIUM. The source code is first translated into a control dataflow graph. Then, after applying graph clustering, scheduling and allocation on this control dataflow graph, it can be mapped onto the target architecture. The c...